A model for the force stretching double-stranded chain molecules 
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We modify and extend the recently developed statistical mechanical model for predicting the 
thermodynamic properties of chain molecules having noncovalent double-stranded conformations, 
as in RNA or ssDNA, and (3— sheets in protein, by including the constant force stretching at one 
end of molecules as in a typical single-molecule experiment. The conformations of double-stranded 
regions of the chain are calculated based on polymer graph-theoretic approach [S-J. Chen and K. 
. A. Dill, J. Chem. Phys. 109, 4602(1998)], while the unpaired single-stranded regions are treated as 

self-avoiding walks. Sequence dependence and excluded volume interaction are taken into account 
explicitly. Two classes of conformations, hairpin and RNA secondary structure are explored. For 
the hairpin conformations, all possible end-to-end distances corresponding to the different types of 
double-stranded regions are enumerated exactly. For the RNA secondary structure conformations, 
a new recursive formula incorporating the secondary structure and end-to-end distribution has 
been derived. Using the model, we investigate the extension-force curves, contact and population 
distributions and re-entering phenomena, respectively, we find that the force stretching homogeneous 
chains of hairpin and secondary structure conformations are very different: the unfolding of hairpins 
is two-state, while unfolding the latter is one-state. In addition, re-entering transitions only present 
in hairpin conformations, but are not observed in secondary structure conformations. 
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I. INTRODUCTION 



In recent years, diverse mechanical manipulation techniques of single-molecule, such as optical tweezers, atomic force 
microscopy, and soft microneedles have been applied in probing basic mechanical, physical and chemical properties of 
' single biological macromolecule, such as proteins, nucleic acids, and molecular motors. The basic principle is to exert 
, force in the pN range on the studied molecules, and then force-extension (FECs) or extension-force curves (EFCs) 
7-H ' are recorded]!, |^, |3|, Many useful insights about biomolecules were obtained by analyzing these curves, going from 
[ the detailed elastic properties jl| to complex structure 

experiments || |6| |?], |^, ^ | . In experiments of unzipping of double-stranded DNA (dsDN A) , FECs show gross features, 
which reflect simply the local GC versus AT content along the sequence j5j |(|. While in stretching single-stranded 
DNA (ssDNA), very different EFCs have been observed: the EFCs recorded by Maier et al. showed extremely 
smooth[0; in contrast, Rief et al. found that ssDNA end-to-end distance (EED) suddenly changed during narrow 
force range. In recent unzipping shorter RNA experiment, except EED increasing abruptly, EFCs with intermediates 
were observed [91. 
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On the theoretical side, compared to the great effort in understanding the unzipping of dsDNA0, 0, HI, H, little 
attention has been paid to the RNA or ssDNA stretching problem 14, 15, 1(| 17, 18, [l9, 20 1. Montanari and Mezard 
discussed the EFCs of a homogeneous ssDNA sequence. Their model exhibited a second order phase transition from 
a collapsed globular state to an extended necklace- like phase jl4j. Zhou and Zhang pointed out the important roles 
of electrostatic and base- pair stacking interactions played in stretching ssDNA[[l7[ [L8 |. Midler et. al. considered how 
the random disorder sequence modifies EFCs characteristics^^. Gcrland et al. explored quantitatively how nuclei 
acids structure determines a broad range of FECs from very rugged to completely featureless. They found that apart 
from the entropy elasticity of the unpaired single strand |lQJ, there are two additional mechanism, the compensation 
effect and the contribution of suboptimal secondary structures smoothing the features of FECs (2Cj]. Though their 
results were discussed on the constant extension ensemble, it may be true for constant force ensemble fl2f. 

There are several reasons to attract increasing attention on the force stretching RNA or ssDNA. First, studies of 
stretching RNA may generate new insights into the RNA folding problem|21|, including folding pathways and the 
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energy landscape^). Unlike the simpler hairpin structure, such as dsDNA, RNA may fold into complicated branched 
structures, and the native state has to compete with a large number of suboptimal states. Exerting force on the 
molecule will drive the native state into metastable states, which might be seen from the FECs or EFCs. Second, 
force stretching provides a direct approach to measure RNA thermodynamic parameters, which are key in predicting 
RNA secondary structure^, ^l). In addition, the force stretching RNA is also an intriguing physical problem, e.g., 
how RNA secondary structures determine the outcoming EFCs|Q, and how to describe the RNA properties far 
from equilibrium induced by external force is still an open question Jl2[. Because the structure complexity of RNA 
is intermediate between dsDNA and protein, the exploration may also be valuable in understanding properties of 
dsDNA and protein[||. 

Although the theoretical models about force stretching RNA molecules jl4[ |l6|, |l8| , p0[ have been proposed, more 
realistic and detailed models are needed |2l|. We note that with a few except ions |20|, almost all models were restricted 
the study of homogeneous or completely random RNA sequences. Whereas stretching experiments showed that 
sequences affect EFCs dramatically j?], ||, In very recent experiment ]2l| , Dessinges et. al found that excluded- 
volume interactions play an essential role as stretching ssDNA, which were assumed to be negligible before. To take 
into account the two factors, in this paper we will modify and extend the statistical model of double-stranded chain 
conformations developed by Chen and Dill to constant force stretching ensemble p5| . We think that their model 
is suitable for our aim. First, the model retains a relatively high degree of realism, which the sequence dependence, 
nonlocal intrachain contacts, and the excluded volume interactions are accounted for explicitly. Second, the model 
is more "microscopic", i.e., only one parameter, base-pair contact energy is required. The entropy is calculated from 
the number of conformations directly. In addition, because the model can completely characterize the full energy 
landscape of secondary structure having chain lengths up to hundreds bases p2j, the exploration of how force changes 
the folding pathways or energy landscape of RNA is valuable. 

The organization of the paper is as follows. We first, in Sec P , simply overview the statistical model of double- 



stranded chain molecules developed by Chen and Dill. In Sec. Ill, we extend the models about two classes of chain 



molecules, hairpin and RNA-like secondary structures conformations to constant force stretching cases, respectively. 
Their corresponding EFCs, contact probability, population distribution and re-entering phenomena, and the difference 
are discussed in Sec. Section. [v| is our conclusion. Some numerical process are relegated to appendices. In the 
Appendix [a] we illustrate the calculation of EED with a special case in hairpin conformations, namely when the closed 
graph is of type 1. In the Appendix [b|, we simply describe how to get the EED distribution of "open" self-avoiding 
walk (OSAW) (self-avoiding walks involving no neighboring contacts) along one coordinate through extrapolating 
approach. 



II. THE DOUBLE-STRANDED CHAIN MODEL 

The details of the statistical model of double-stranded chain molecules can be found in Refs. and [^|. Here 
we give a brief review. 



A. Polymer graph theory 

The model is based on polymer graphs, diagrammatic representations of the self-contacts made by different chain 
conformations. Fig. |l| shows a hairpin conformation and corresponding polymer graph: vertices represent the chain 
monomers, straight line links symbolize the covalent bonds, and curved links stand for spatial contact between 
monomers. A given polymer graph represents an ensemble of chain conformations that are consistent with given 
contacts. Conformations having contacts other than those specified by the polymer graph belong to other graphs. In 
general, fewer curved links exist in a graph, a more larger number of chain conformations are consist with this graph. 
Any two pairs of curved links in a polymer graph must bear one of three relationships: nested, unrelated and crossing 
linked. Graphs of the double-stranded chain conformations involve no crossing links, examples of which include the 
simplest hairpin structure, such as dsDNA, and mainly secondary structures among nucleic acids and antiparallel 
/3-sheets in proteins. In terms of polymer graph, the partition function of a [N + 1) monomers ((N+l)-mer) chain 
molecule is given as a sum over all possible polymer graphs, 

Q N (T) = Y / V(r)e- E ^/ k * T , (1) 
r 

where ks is Boltzmann constant, T is temperature, T is an index of a possible polymer graph, -E'(r) and Q(T) are the 
energy and the number of conformations of the given polymer graph T, respectively. To calculate ^(r), Chen and 
Dill developed a matrix multiplication method^, p5[ . 
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FIG. 1: The polymer graph of hairpin conformations. The shadowed region is a face of the graph. The graph is di- 
vided into two parts: tow tails [s, so] and [lo,l], and one double-stranded region, or CG enclosed by outmost link (so,lo)- 
Here s and I are two end monomers in chain. The double-stranded region is composed of links nested each other: 
(so,Zo),- ■ • (sj-i,lj-i), (sj,lj), (sj+i,Zj + i). 



B. The matrix method for a given polymer graph 

A complex polymer graph can be divided into a series of faces consecutively, in which face is a region in graph that 
is bounded by curved and straight lines and contains no other edges; see the shadowed region in Fig. [l| Faces are 
classified into five types: left (L), middle (M), right (R), left-right (LR) and isolated (I), according to the arrangement 
of the nested curved links that bound the face. The calculation of the full partition function O(r) for a given graph 
r is correspondingly separated into two steps: to count all conformations for each face, as if they are isolated and 
independent of each other, then to assemble these conformations into Q(T). To avoid that conformations of different 
faces bump into each other (excluded- volume interaction), more detailed information about the conformations of 
faces is needed. On two dimension (2D) lattice, it is realized by exact enumeration. First, conformations of each 
face are classified into sixteen types according to the shape of ports (inlet and outlet) through which it is connected 
to other faces; see Fig. ||. Then the compatibility between neighboring faces can be checked exactly through the 
spatial compatibility between the outlet of one face and the inlet of next face. It is convenient to introduce two 
matrices: the Face Count Matrix (FCM) St, which matrix element (St)jj is the number of conformations having an 
inlet conformation of type i (1 < i < 4) and an outlet conformation of type j (1 < j • < 4) for a type t face, and 
viability matrix Y tlt2 , which element (Y tlt2 )y is 1 or if connection between type i outlet of a type t\ face and type 
j inlet of a type i 2 face is viable or not viable. For a hairpin graph T having M faces, Q(T) is derived as a product 
of matrices: 

fi(T) = U ■ S tM ■ Y tMtM _ 1 ■ ■ • • S tl ■ U 4 , (2) 

where U = {1,1, 1,1}, U' is the transpose of U. 




type 1 type 2 type 3 type 4 

FIG. 2: The four types of port (inlet or outlet) shapes on 2D lattice. 



C. The partition function 

In order to calculate the partition function of a whole chain using Eq. the sum over all possible polymer graphs 
is necessary. For the double-stranded conformations, an efficient dynamic programming algorithm was developed in 
Rcf . [^5| . The idea is to start with a short chain segment and elongate the segment by adding one monomer for each 
step, and then to calculate recursively the partition function of the longer segment using the result of the shorter one. 
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The algorithms will be discussed as needed in following sections. More useful alternative expression of Eq. [I] is 

Qn{T) =Y,9N{E)e- E l k * T , (3) 

E 

where gN(E) is the density of states, or the number of conformations having energy E, which is defined as 

9Jv(£)=£fi(r)| B(r)=i3 . (4) 



III. FORCE STRETCHING DOUBLE-STRANDED CHAIN MOLECULE: THE HAIRPIN AND RNA 

SECONDARY STRUCTURE CONFORMATIONS 




FIG. 3: Sketch of the force stretching experiment on the 2D lattice. The larger dark points represent two ends of a chain 
molecule. One end is fixed by a pin, and another is stretched by a constant force F = /x along a given unit vector x . The 
monomers are denoted to be small dark points. The average extension x is recorded as a function of /. 



Fig. |^ depicts the situation studied in this paper: one end of double-stranded chain molecule is fixed by a pin, and 
a constant force F = /x Q is exerted on the other end, where x Q is a given unit vector; average extension x along x D 
is recorded as function of /. To include the contribution of force stretching energy, we extend the partition function 
Eq. || to Qn{T; f) at given force / as 



(5) 



E A 



where A is EED of the chain along x D , and grj(E; A) is the number of conformations whose energy and EED are E 
and A, respectively. Define an auxiliary partition function Qpj(E; f) as 



g N (E;f) = J29N(E;A)i 



A/A 



Then instead of Eq. [|, we have 



Q N (T;f) = J2^(E;f)e-> 3E . 



(6) 



(7) 



Apparently, the following relations are satisfied: J2&gN(E; A) = g^(E) or Qjy(E;f = 0) = gN{E). The measured 
extension x at force / is calculated from partition function as 



x = k B T— log Q N (T;f). 



(8) 



The calculation of gN(E; A) is key in force stretching problem. Due to the new parameter, EED A, the dynamic 
programming algorithm pq| must be modified and extended carefully. In following sections, we show how to compute 
gi\[(E; A) of hairpin and RNA secondary structure conformations respectively. 
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A. Force stretching hairpin conformations 



As one of the simplest elements in secondary structure, hairpin exists in a large class of biomolecules, such as 
RNA hairpin, peptide /5-hairpin and dsDNA hairpin. Recent works showed that the hairpin conformations have 
remarkable thermodynamic and kinetic behaviors p5|, p6| . One may hope that they also have remarkable mechanical 
characteristic. On the other hand, more precise formula and the essence of theory for complex RNA secondary 
structure conformations make us to explore them stretched by force independently. 

The polymer graphs of hairpin conformations are that every curved link bears a nested relationship with respect 
to every other curved link. The polymer graph is composed of two parts: two non-self-contacting tail chains (s, Sq) 
and (lo,l) and one double-stranded region (so,?o)> which is defined as Closed Graph (CG)p5|; see Fig. @. The 
number of conformations for a given graph without force equals a multiplication of the number of double-stranded 
conformations and the number of two tails conformations. In terms of four types of the outermost faces, the graphs 
are classified into four types. (LR type is excluded from hairpin conformations). To sum over all polymer graphs, 
two matrices, the Closed Graph Count Matrix (CGCM), G*t [E, so, lo] an d diagonal matrix 10 [s, sq; lo, 1} have been 
defined: (G* t [E,s ,l ])ij is the sum over the number of conformations for all possible t type graphs having energy 
E, given that the outmost link spans from vertex l to vertex sq, and innermost and outermost links are in i and j 
types conformations; diagonal matrix element (lo [s,sq;Iq, I]) a is the number of conformations of two tails (s, sq) and 
(Zo, Z) that are spatially compatible with type i conformations. State density gpf(E) then can be written as: 



9n(E) =U 



s ,lo,t 



co[s,s ;l ,l] ■ G* t [E,8 ,l ] ■ U* 



(9) 



here 1 < t < 




e (x,y) 




(A) (B) 

FIG. 4: (A) Sketch of the force stretching hairpin conformations. Here the double-stranded region contributes +1 to EED of 
the whole chain. (B) Illustration of how the conformation ended at (x,y) is distributed to whole lattice plane by eight square 
symmetric transformations, where the shadowed circles represent the double-stranded region. 



When force is nonzero, g?j(E) is extended to gjsr(E;A). u[s, sq;Iq,1] is also nondegenerated into u>[s, sq', lo, J|A], 
From Eq. |^, we note that force stretching work just depends on EED of the chain, which does not involve any detailed 
double-stranded structure directly, i.e., only the outmost link (sq, Iq) contributes ±1 or to EED A of the chain along 
x direction; see Fig. |]. Though, in real nucleic acid, a typical distance of a hydrogen bond is about 3 times than the 
nucleotide distance, it does not make sense for our description on a coarse-grained level. We can exactly enumerate 
EED of the whole chain with three steps: first, to fix one conformation of a given CG on lattice and grow two tails 
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which are compatible with the graph type; then, to enumerate EED of the two tails and combine them with EED of 
the graph according to force direction; and finally, to distribute the full conformation to lattice plane by eight square 
symmetry transformations (D4 group). We define a new diagonal matrix II [s, sq; lo, l\f], 

II [s, SQ ; l , l\f] = 5> [s, s ; l , l\A] e" A . (10) 

A 

So we obtain 

Q N (E; /) = U • 53 n [a, s ; l ,l\f] ■ G\ [E, s , l ] • U*. (11) 

s ,l ,t 

The basic idea is very simple, however, it is very cumbersome to complete this process. Rather than to give a 
detailed analysis here, we illustrate the process in Appendix [a| by a special case. 

Though the enumeration method gives quite accurate EED of the hairpin conformations, unfortunately, as applied 
to longer molecule chain, it is hard to give exactly analytical formula about n* (JV; x) and n* (iV; y), which are defined 
in Appendix |A| A more practical approximation is proposed in next section. 

B. Force stretching secondary structure conformations 

Secondary structure can be seen as a tree-like structure into which four basic structure elements, helix, loops, 
bulges and junctions compose through self-similarity arrangement. Each graph of secondary structure conformations 
is divided into two levels: the first lever is a combination of unrelated double-stranded regions connected with single- 
stranded chains; the second level is that each CG may be viewed as an independent secondary structure except that 
two end monomers of the region contact. We first simply review how to calculate the state density g(E) without 
force. Necessary definitions are introduced. 

To be different from hairpin conformations, the calculation of full state density of secondary structure is more 
complex. First all possible CGCMs, G* t [E,a,b] (Note that LR-type faces are included) are computed, where (a, b) 
is the outmost link of the CG. Any CG is composed of smaller unrelated subclosed graphs, auxiliary matrices, 
~K.t[E,l,a,b] counting a combination of conformations of all subgraphs are introduced, where cyclelength I is total 
number of monomers of the single-stranded chains in [a, 6] [p5| . CGCMs can be obtained by multiplication of matrix 
~K.t[E,l,a,b] and the number of conformations of the single-stranded part. Then in order to combine conformations 
of unrelated graphs into whole, matrices Gt [E,s,a] having energy E for full polymer graph [s,a] arc defined, where 
t = 0,1,2 represent three full graph types which are classified depending on whether 0, 1 and 2 existing links are 
connected to the rightmost monomer, see Fig. [| Their element (Gt [E,s,a])ij is the number of conformations for 
graphs in which the outmost link (of rightmost subgraph) and the innermost link (of the leftmost subgraph) are in j and 
i conformations, respectively. All matrices mentioned above are calculated by dynamics programming algorithm p5|. 
The full density of states of the secondary structure conformations is written as Q 

2 

g N (E) = U-Y,G t [E,s,l]-U t . (12) 
t=o 

When force is exerted on the chain, exact knowledge about EED is required. In principle, we can generalize the 
enumeration method in hairpin case to secondary structure: to enumerate all EEDs of each single-stranded chains 
separated by CGs, and to combine them with contribution of the CGs, ±1 and to get the full EED. Apart from several 
special cases, e.g., only one or two CGs allowed in polymer graph, however, the approach is not practicable for general 
graphs from the experience in counting EEDs of hairpin conformations. We propose an alternative method. Because 
that the detailed structures of the CGs are not involved into the whole EED directly, CGs can be viewed as effective 
covalent bonds connecting left and right parts of conformations. Hence, the secondary structure conformations are 
identified into OSAWs with the reduced monomers; the effective bonds bear the excluded volume interaction of CGs 
with other units in the chain. The EEDs are computed by enumeration and extrapolation method. Numerical results 
of the latter can be found in Appendix [b|. Considering that conformations of monomers neighboring CGs are almost 
"frozen" by excluded volume interaction, it may be more reasonable to include more monomers into the effective bonds 
such as illustrated in Fig. |5[ In present paper the simplest case is discussed. Though the effective chain approach 
(ECA) may overestimate the number of conformations for partially considering excluded volume interaction, it is still 
valuable before better methods are given. 

To realize the ECA, we modify the matrices Gt [E, s, a] to Gf [E,n, s, a], where new parameter n is the number 
of effective chain monomers. The recursive relations in Ref. (also see Fig. ^) then are extended into following 
relations: 
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e o • • o o • • • • 

a b c d 1 

FIG. 5: Illustration of how unrelated closed polymer graphs are reduced into effective chain to calculate EED of the chain. 
For example, two CGs [a, b] and [c, d] in upper are replaced by two bonds in below, and length of the chain is reduced to 
(a — s) + (c — 6) + (I — d) + 2 from I — s. Considering that excluded volume interactions between CGs and single-strand parts 
make conformations of the neighboring monomers of CGs to be "frozen", it may be reasonable to reduce chain length more, 
showing here by dashed brackets, and effective length is (a — s) + (c — b) + (I — d) — 2 now. 





s a a-1 s a-1 a 





s as asb asb a 



FIG. 6: the recursive relations for matrices Gt[E, n, s, a], where t = 0,1, 2. 



Go[E,n,s,a] = Gq[E, n — 1, s, a — 1] + 
Gi[£,n,s,o] = 5 n s G* t [E,s,a] 

t=L,M,I 
0<b<a Ei 

+ E J2G 1 [E-E u n 

0<b<a E 1 

G 2 [E,n, s,a] = <5„ 4 ^ G *ti E , s > a } 

t=R,LR 

+ E J2G [E-E u n 

0<b<a El 

+ E J2G 1 [E-E u n 

Q<b<a Ei 



i[£7,n-l,s,a-l], (13) 
l,s,b] G *t[Ei,b,a] 

t=L,M,I 

l,s,b] G* t [E u b,a], (14) 

t=M,I 

l,s,b] G *t[Ei,b,a] 

t=R,LR 

l,s,b]G* R [E u b,a]. (15) 
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Correspondingly, gN(E) is extended to Gn[E; /] as 

N 

ri=l m t 

where C x {n\m) is the number of conformations of n-step OSAWs whose final a; coordinates are m. Eq. [l6] clearly 
separates contribution of the unrelated CGs from the single-stranded parts. 

Because hairpin is one of four elements in secondary structure, EC A should be suitable to it. We also note that 
exact enumeration still is available when the lengths of two tails are smaller than a given value (27 in this paper), a 
mixture of two methods is applied in hairpin case. We believe that it is reasonable, since as CG region (s Q , l ) is so 
large, or tails are so short that tails conformations almost are "frozen" by excluded-volume interactions, enumeration 
method may be accurate. While the lengths of tails are relatively longer than the CG size, or the excluded interactions 
between CG and tails is weaker, the ECA may be available. 



C. Test of ECA without force 

We compute the density of states for hairpin and secondary structure conformations respectively. Comparing with 
Chen and Dill results, we find that our approximation gives considerable consistent results; see Figs. 0. The results 
can be tested against exact enumeration of shorter chains, which have been showed in Figs. 9 and 14 in Ref. p5|. 




5 10 15 20 25 30 35 2 4 6 S 

number of contacts number of contacts 

(a) (b) 

FIG. 7: (a) Comparing density of states for hairpin conformations calculated by enumeration p5|] method and its mixture with 
ECA, where the symbols come from enumeration method, and the different lines represent results from the latter. The energy 
function used is E = —ex (number of contacts) (e > 0). The chain monomers increase from 20-mer to 70-mer per 10-mer. 
Two methods give almost the same results, (b) Comparing density of states for secondary structure conformations calculated 
by Ref. [[^5| and ECA, where symbols are from the former, and the different lines represent results from ECA method. The 
monomers change from 12-mer to 19-mer. We find that our approach gives good approximation. 



IV. TESTS OF THE MODEL: EFCS, POPULATION DISTRIBUTIONS AND RE-ENTERING 

PHENOMENA 

We have described above how the statistical mechanical model of double-stranded chain molecules is extended 
to force stretching problem. In this section, we make use of the model to explore EFCs, contact and population 
distribution, and re-entering phenomena. To account for specific monomer sequences, sequences are divided into two 
classes: homogeneous and RNA-like chain. Each contact of the homogeneous chain contributes a sticking energy — e. 
While RNA-like chain has a specific sequence of four types of monomers: A, U, C and G, resembling the 4 types 
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of bases of an RNA; only A-U pair or C-G pair contributes an sticking energy — e. In following, we take the lattice 
spacing b. 

A. Extension-force curves 

We first investigate how temperature T and monomer number (N + 1) affect EFCs of homogeneous chains restricted 
to hairpin and secondary structure conformations, respectively; see Fig. We have following results: at given 
temperature, EFCs of any class conformations have similar shapes for any Af-value; shapes of the EFCs become 
smooth when temperature increases. In particular, extensions in hairpins increase sharply to full length during a 
narrow force range when force reaches critical value such as / = 0.48e/6 at T — O.le/fcs. It is a typical all-or-none 
behavior. Whereas at the same temperature, extensions in secondary structures increase at force 0.48e/6, but they 
reach the full length slowly and smoothly until force is about 0.75e/6. In addition, the EFCs of hairpins are more 
stable with temperature than that of secondary structures. Generally, the dramatical jumps in hairpin EFCs show that 
contacts in this conformations have higher cooperativity. In contrast, for secondary structure conformations, there is 
a more larger number of suboptimal states, by which dramatical EED changes may be absorbed; lower cooperativity 
is reflected from smoothness of EFCs. The same reason may also explain the different thermal stability of EFCs with 
temperature observed in different structure restriction. 

To illustrate effects of specific monomer sequence to EFCs, we compute EFCs of biomolecule sequences P5ab, 
P5abcAA, and P5abc; see Fig. ||(a). These sequences come from force unfolding RNA experiment studied by Liphardt 
et al. recently Thought we do not attempt to fit experimental EFCs in this paper, EFCs of secondary structure 
conformations derived from our model do reveal similar information with observed in experiment: at T = OAe/ks, 
unfoldings of P5ab and P5abc start at different forces; EFC of P5abc has an apparent intermediate, i.e., after a large 
jump without intermediates at / = 0.37e/& there follows a inflection. 

We also calculate EFCs of random, alternating and designed sequences; see Fig. |9|(b). The designed is composed of 
four bases, A, U, C and G arranged by A ■ ■ ■ ACCCCU ■ ■ ■ UC ■ ■ ■ CAAAAG ■ ■ ■ G, where the dots represent 15-mer 
A, U, C and G consecutively. Comparing the EFCs of the artificial sequences with curves got before, we find the 
EFC of the random sequence is very similar with the EFCs of biomolecular sequences in Fig. |^(a); the EFC of the 
alternating sequence is almost the same with the results of homogeneous chain of secondary structure; while the EFC 
of the designed sequence like the EFCs of homogeneous hairpins. From designed sequence arrangement, we think that 
the chain prefers to form two hairpins simultaneously. So if the force can unzip one of two hairpins, it is also large 
enough to unzip another at the same time, which can be reflected from the large jump of the extension from small to 
full length abruptly. 

B. Monomer-monomer contact distribution and population distribution 

Our model not only can predict EFCs of different sequences, but also it relates the EFCs with structures. We 
calculate contact distributions p(i,j;T,f) for every possible contact pair (i, j)|p5fl. The equilibrium p(i,j;T,f) is 
determined by the ration of the conditional partition function Qiv(iii; T, /) for all the conformations that contain 
contact and the full partition function Qjv(T;/) defined in Eq. g, p(i,j;T,f) — Qn(i, j',T, f)/ Qn(T; f). From 
the contact distribution, the structure of molecule is derived at given force and temperature. As an illustration, 
density diagrams of the distributions of the designed sequence restricted to secondary structure are shown in Fig. 
It is apparent to see that two hairpins almost present or disappear simultaneously according to the force values, 
smaller or larger than 0.42e/6. 

Although contact distribution provides the information of chain structure, it is insufficient to see globally how 
force unzips complex chain molecules, or what role cooperativity plays. In thermal melting RNA, energy landscape 
F(n, nn) was used to provided insights into cooperativity for the melting transition, where n and nn are the number 
of native contacts and non-native contacts respectively [^J. For our purpose, the EED A and total contacts q of 
conformations are taken as "order parameters" . Instead using free energy surface, we define population distribution 

p(A,q;T,f) = ) Q N (A,q;TJ), (17) 

where Qat(A,q;T, /) is conditional partition function for all conformations whose EEDs and the number of contacts 
are A and q. The probability p(A, q; T, f) carry the same information as the free energy. As an application, the 
population landscapes of the designed sequence with six typical forces at the same temperature 0.37e/kB are shown 
successively in Fig. O. At a force smaller than 0.33e/&, virtually all conformations populate the "native" structure, 
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force f (E/b) 
(a) 



force f (e/b) 



(b) 




0.1 0.2 G.3 0.4 0.5 0.6 0.7 0.1 0.2 0,3 0.4 0.5 0.6 0.7 

force f (e/b) force f (e/b) 



(c) (d) 

FIG. 8: The EFCs of homogeneous chains restricted to secondary structure conformations: (a) and (c), and hairpin confor- 
mations: (b) and (d). EFCs of hairpin case have first-order-like behavior being similar with unzipping of dsDNA. In contrast, 
EFCs of secondary structure change slowly and smoothly. 



in which two hairpins coexisting. Increasing force above 0.33e/6, the conformations separate into two classes: one 
class is that conformations are still two hairpins coexisting; the other class is that only one hairpin exists in all 
conformations. Increasing force further (Fig. |ll](d)), three classes of conformations present: except two mentioned 
above, the unfolded or full stretched conformations form a new class. Finally Fig. |l|(f) shows that all molecules are 
full unfolded as force is larger than 0.43e/fe. 

According to Eq. ||, the derivative of extension x with respect to force / can be expressed as d < x > /df = (< x 2 > 
— < x > 2 )/1cbT. The formula is almost the same with the heat capacity C(T), except that extension x is replaced by 
energy, and force is replaced by temperature. The analogy of heat capacity and EFCs can give lots of insights. E.g., 
in thermal melting proceeding, because the peaks in C(T) curves mark molecular structure transitions, the sudden 
increases in EFCs might be also understood as structure transitions driven by external forces, while the counterparts 
of the melting temperatures at peaks in C(T) are the "melting" forces at large jumps in EFCs. Based on the same 
reason, conformations between two extension jumps could be called in "intermediate" states like in thermal melting 
case, such as predicted in Fig. ^| and observed in experiment |§ about P5abc. So, just like the important roles of 
energy played in heat capacity concepts, we believe that EEDs may play the similar roles in force stretching problem, 
at least in nucleic acids. The most direct proof is that the population distribution p(A, q; T, /) of EED and energy in 
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force f (E/b) 



(b) 



FIG. 9: The EFCs of heterogeneous secondary structure chains: (a) the sequences are 49-mer P5ab, 64-mer P5abcAA and 
69-mer P5abc. (b) 70-mer sequences are random, alternating and designed. Two colors represent EFCs at different temperature. 
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FIG. 10: The density diagrams for contact probability p(i, j;T, /) of the designed sequence restricted to secondary structure 
conformations, where forces are 0.25 and 0.45e/6 at temperature O.le/kB- 



conformations are closely-related, as is showed in Fig. [ll|. 

The corresponding of the EED and energy is not needless. To show its usefulness, we explore the EED distribution 
of the chains of hairpin and secondary structure conformations. The EED distributions p(A; T, /) can be given by a 
sum over contacts q of p(A, q; T, /), 

p(A;T,/) = £>(A,g;T,/). (18) 
<? 

Fig. [l2| is the distributions of the 70-mer homogeneous double-stranded chains: for the secondary structure conforma- 
tions, only one maximum is observed at any force; while in the hairpin conformations, at some forces, here near the 
melting force 0.46e/6 at temperature 0.25e/fcB, two distinct populations of shorter and longer extensions present, but 
no conformations with other lengths in between. We name two structure transitions as "one-state" and "two-state" , 
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respectively. The transition definition is borrowed from thermodynamics of polymers, in which the distribution is 
about molecular microscopic energy [p8|. The one-state and two-state transitions are resemble a critical point and 
"first-order transitions" in thermodynamic limit. The difference of EED distributions of two classes conformations 
warns that the simple EFCs may cover critical physical informations. Like the importance of the width of melting 
peaks in determining thermal transitions type, the third-order derivative of the average extension with respect to 
force is essential to determine the transitions types in the force "melting" double-stranded chain molecules. 

To be different from ordinary polymers, mechanical properties of biomolecules are more complex. The main reason 
is the effect of monomer sequence[|l2[ [l5| Hence, the types of structure transitions driven by force is inevitable to 
be affected by sequence. The apparent example is the EED distribution of the designed sequence, which can be see 
explicitly from Fig. |ll|: at temperature 0.37e/kB, three isolated peaks in the distribution at force 0.38e/fe, two peaks 
at force 0.36 and OAOe/b. In fact, EFC of the sequence at the temperature seems very trivial; the curve is only more 
smoother than the EFC at O.le/b. 

C. Re-entering phenomena 

Finally, we explore the extension dependence on temperatures as external force fixed. The studies are related with 
the phenomena about re-entering transitions happened at low temperature in stretching dsDNA case ^7|. It 
means that if one fixes external force at a value in a finite range and decreases temperature, beginning with stretched 
state, chain molecule will first collapse to a globular state; while temperature is lowered further, chain will re-enter 
stretched state again. One can judge the existence of the phenomenon from extension-temperature curves directly, 
i.e., there are dips at lower temperature on the curves for some force values. Recently Midler pointed out that the 
re-entering phenomenon also present in force stretching RNA molecules |l5||. It is interesting to see whether our model 
can predict the transitions. We check it by numerical computing for homogeneous chains of secondary structure and 
hairpin conformations respectively. These results are shown in Fig. [1^, The numerical data demonstrate that the 
re-entering transitions only present in hairpin conformations, but are not been found in secondary structure. 

V. SUMMARY 

In this paper, we modify and extend the statistical model of double-stranded chain molecules developed by Chen 
and Dill to force stretching case. Unlike several theoretical models proposed previously, specific monomers sequence 
and excluded volume interaction are exactly included. We use our model to investigate how EFCs depend on chain 
length, sequence, and structure. In addition, our model can relate EFCs with molecular structure directly. And we 
also investigate re-entering phenomenon in hairpin and RNA conformations. Because our aim in this paper is just to 
illustrate, our model is largely simplified, e.g., the chain is restricted on 2D lattice, the chain stiffness (bending) and 
elasticity are neglected, and no stacking interaction is involved which may be important in force stretching theory [jl8|. 
However, our results still reveal many important and interesting physical results. In future work, more realistic 
requirements, such as eliminating of lattice restriction^] and adding of stacking energy could be considered. 

Acknowledgments 

We are indebted Dr. Shi-jie Chen, Tao Xu, and Prof. H.-W. Peng for many helpful discussions in the work process. 

APPENDIX A: CALCULATING END-TO-END DISTANCE DISTRIBUTION OF HAIRPIN 

CONFORMATIONS 

For hairpin conformations, the conformations of tails are determined by two factors: the outermost link type and 
the length of CG. To account for excluded interactions, the tails have been classified into two types, stiff (s) and flex 
(f)p^j. Correspondingly, the number of conformations is n^iVj), where (s or f) are the type of ith (1st or 2nd) tail 
with length Ni, or A^-step OS AW. In order to calculate EEDs of whole chain , we introduce additional definitions: 
n^(Ni;m), the number of conformations for ith OS AW of type U whose final x coordinates are m; and n^j(Ni;m) 
is similar except y coordinate. It is ease to find n ti (N i ) = Yl m n x(^i' m ) = E ra n j (^; m )- Here we illustrate the 
calculation of diagonal matrix element Tl(l, Iq; sq, s)n when the outmost link is of type 1. 
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FIG. 11: The population landscapes p(A, q;T, f) of the designed sequence restricted to secondary structure conformation, 
where forces in unit e/b, the temperature 0.37e/fcs, and the coordinate from to 70 (x-axis) is the EED A, while another 
coordinate (y-axis) is the contact q. 



1. s = so, I la 



In this case, the length of the 2nd tail is N2 = I — l . There are six classes of possible conformations for CG-tail 
complexes [^| . In Fig. [li], we show one of them. Since the length of CG also affects exclude volume interaction, we 
distinguish different CG sizes. 
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(b) 



FIG. 12: The EED distribution p(A; T, /) of the homogeneous chains of secondary structure (a) and hairpin conformations (b), 
where different forces are in unit e/b, and the temperature is 0.25e/fcs. At force 0.46e/6, two isolated peaks appearing in the 
distribution represent two-state transition in hairpin conformations. 




FIG. 13: The extension-temperature curves for 25-mer chains of secondary structure (a) and hairpin conformations (b) at 
different force values, where the sequences are homogeneous. Arrow in (b) points out dip minimas, which demonstrates the 
appearance of re-entering transitions in hairpins. 
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FIG. 14: One class of conformations for the 2nd tail (l , 1) and CG (so, lo) containing outmost type 1 link, where the tail type 
is stiff and extends upward |25[| . The force stretching end I located at (x',y') on frame O' is translated to point (x',y' — 1) on 
frame O, and the corresponding EED A is x' . Then we distribute the conformation to lattice plane by eight square symmetry 
transformations (see Fig. ^]). These EEDs are tabulated respectively, where the character "T" represents transformation 
elements. 



(1) l — s a = 3. In this case, all six classes of conformations are viable. Therefore, 

U[s,s ;l ,l\f] u = 2x2 ^ n^ 2 ;y) (cosh + cosh /3/(y+l)) 

- v 

+ n l 2 ( N 2;x) (cosh (3 fx + cosh/3/(x + 1)) 

X 

+ E n v ( N *v) ( 2 cosh ^ + cosh/3/(y + 1) + cosh/3/(y - 1)) 
v 

+ ^ n s x 2 (N 2 ; x) (2 cosh (3 fx + 2 cosh (3f(x + 1)) 

X 

- (1 + cosh/3/ + cosh/3/7V 2 + coshf3f(N 2 + 1))] , (Al) 

where one of coefficients 2 is degeneracy degree along force direction, and another is from combining exponentials 
to hyperbolic cosine function, the negative part is to eliminate the straight tail conformations, which are counted 
repeatedly. 

(2) l Q — s a > 3. There are five classes of conformations are involved. We have 

n[ S , S0 ;Wl/]n = 2x2 ^n^(iV 2 ;y)(cosh ( 9/y + cosh/3/(y + l)) 

- y 

+ J2n f x 2 (N 2 ;x) (coshf]fx + cosh(3f(x + l)) 

X 

+ E n v ( N *V) ( cosh ^ + cosh/3/(y + 1) + cosh 0f(y - 1)) 
y 

+ ^ n s x 2 (N 2 ; x) (2 cosh/3/x + cosh(3f{x + 1)) 

X 

~(cosh(3f + cosh (3 fN 2 )}. (A2) 
2. s / so, / / lo 

A large number of CG-tail complexes have been listed in Ref. p5|. Since there are two tails, we define Ni = s a — s 
and N 2 — I — l a . According the length of CG part, we then have following results. 
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(1) l a — s Q = 3. This case is more cumbersome than above cases. We divide H[s,Sq;IoJ\ f]n into the sum over 
11^ , j = 1, • • • , 5 such as 

5 

n[ s , So ;Jo,z|/] 11 = 2x2^n« (A3) 

where 

n(1) = EE<( iY i'yi) n y 2 ( 7V 2;2/ 2 )(2cosh/3/(y 1 +j /2 )+cosh/3/(y 1 -y 2 )) 

VI V2 

+ EE n v (^i: »iK a (^2^2) (2 cosh^/d/! + x 2 + 1) + cosh/3/( 2/1 - x 2 - 1) 
+ cosh/?/ (3/1 + x 2 ) + cosh/3/(yi - x 2 )) 

+ E E < (^i^K 2 (^2; 2/2) (2 cosh /?/(*! + 2/2 + 1) + cosh/?/^ - y 2 + 1) 

Xl J/2 

+ 2 cosh/3/(xi + 2/2) + cosh/3/(xi - y 2 )) 

+ Y / ^n s x 1 (N 1 ;x 1 )n s x 2 (N 2 ;x 2 ) (3coshf3f(x 1 +x 2 + 1)) , (A4) 

Xl x 2 

n(2) - ^^<(^Vi;yiK 2 (iV 2 ;y2)(cosh/3/(2/ 1 + 2 /2 )) 
2/1 V2 

+ J2J2 n v (^1; »i)"x (^2; a*) (cosh/?/(?/i + x 2 + 1) + cosh/3/(yi + a; 2 ) 

j/i 12 
+ cosh/3/(yi - x 2 )) 

+ E E ( N i-> x i) n v (^2; 2/2) (2 cosh/3/(x 1 + 2/2 + 1) + cosh /?/(*! - 2/2)) 

Xl J/2 

+ E E < (^1; x i)"x 2 (^2; a*) (cosh/3/( yi + j/, )) , (A5) 

xi x 2 

n(3) = EE<( iV i^iK 2 ( iV 2;2/2)(cosh/3/(2/ 1 +2/2)) 
2/1 y2 

+ E E "y 1 (^15 f iK a ( N * x 2 ) (2 cosh /?/(?/! + Z2 + 1) + cosh/3/(2/i - x 2 )) 

J/l x 2 

+ ^y^n i x \N 1 -x l )n^{N 2 -y 2 ) (cosh/?/(a;i + y 2 + 1) + cosh/?/^ + 2/2) 

Xl J/2 

+ cosh/3/(xi - 2/2)) 

+ E E "x 1 (^1 ; »i X 2 (N 2 ; x 2 ) (cosh (3f( Xl + x 2 + 1)) , ( A6) 

xi x 2 

n(4) = EE n y 1 ( iV i'yi) n x 2 (^ 2 ;^2)(cosh/3/(2/i+x2 + l)+cosh/3/(2/i-x2)) 

J/l x 2 

+ EE n x 1 (^i^iK 2 (^2;2/ 2 ) (cosh 0f( Xl +2/2 + 1) +cosh/3/(x 1 - 2/2)) , (A7) 

xi yi 
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n(5) = ~ ^<(^i;yi)(cosh/3/(y 1 +iV 2 + l) + cosh/3/(y 1 +iV 2 )+cosli/3/y ] 

- Vl 

+ ^2n s x 1 (Ni;x 1 ) (cosh/3/(xi + N 2 + 1) +coshf3f(x 1 + 1) + cosh/3/xi) 

Xl 

+ Z < (7V2; 2/2 ) ( cosh Pf(y* + Ni + l) + cosh f3f(y 2 + iVi) + cosh/?/?,,) 

V2 

+ ^2 n s x 2 (N 2 ;x 2 ) (cosh [3 f(x 2 + Ni + 1) + cosh/3/(a; 2 + 1) + cosh/3/x 2 ) 

X2 

+ J2n( 1 (N 1 ;y 1 ) (cosh/3/( yi + iV 2 + 1)) + £ (JV i; an) (cosh/3/an) 



(2) ; - s > 3. 

n[s,s ;fo,J|/]u = 2 x 



^^^^aiK 2 ^;^) (cosh/3/( yi +x 2 + 1) + cosh/3/( yi - x 2 + 1)) 

3/1 x 2 

+ Z Z< {Ni;x 1 )n s y *{N 2 ;y 2 ) (cosh/J/fo - y 2 ) + cosh/3/(an + y 2 + 1)) 

Xl J/2 

+ Z Z < CNii 2/i)«x 2 (^2; ar 2 ) (cosh/3/(t/i + a: 2 + 1) + cosh/3/( yi - x 2 )) 

Vl X2 

+ Z Z < ( N ^ x ^ n v (cosh/3/(n - ift) + cosh / 3/(x 1 + y 2 + 1)) 

XI V2 

+ EE n ?(*'»K ! ( f2 : 12 ) {coshpf( yi -x 2 ) +cosh/3/(y 1 + z 2 + 1)) 

2/1 x 2 

+ Z Z (^^iK (^2; y 2 ) {coshpf(xt + y 2 + 1) + cosh/3/On - y 2 )) 

3:i V2 

+ Y,J2 n -( N ^ x ^ n v( N ^y^ (cosh P f( Xl -y 2 )+ cosh Pf( Xl +y 2 + 1)) 

Xl V2 

+ Z Z n £ ^K? (^2; a*) (cosh/3/( yi + .t 2 + 1) + cosh/J/Cj/i - x 2 ))] • 



(A8) 



(A9) 



These equations are so boring that we are not ready to list all H[s, s a ; l a , l\f] (added up to 15 cases). If one sets force 
vanishing, all equations will return to Chen and Dill results for hairpin caseEjl. 



APPENDIX B: EXTRAPOLATION TECHNIQUE FOR ONE DIMENSION END-TO-END DISTANCE 
DISTRIBUTION OF N-STEP OPEN SELF-AVOIDING WALK 

The knowledge about the EED of N-step OSAW is important for our studies, especially, when N is larger enough, 
the exact enumeration can not bear. Fortunately, similar issues has been discussed early, where an asymptotic formula 
about the distribution of SAW was fitted^, ^o), In this paper, we follows the similar approach to find the distribution 
density of the OSAW. 

Let Cat be the total number of N-steps OSAWs, and let C x (N;m) be the number of walks which reach m at x 
coordinate. Then EED density is defined as p%(m) = C x (N;m)/CN- According to the density form suggested by 
Domb et ai, p x N (m) is written as 



r , . 0.32070 . 0.11423a; 4 N 



(Bl) 



where < x% > is one dimensional mean-square EED. The mean square < x 2 N > can be solved by extrapolation 



c *~ AT 

technique pM. Here we only give our numerical result, < x 2 N >= 0.75.ZV 1 ' 45 . To computer C x (N;m), we need C 
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from the definition of p x Nl whose asymptotic formula has been provided by Chen and Dill[p5[. Several examples about 
EED density of OSAW are shown in Fig. M 



11=2? 
M = 27 



■o 

a 




x coordinate 



FIG. 15: Comparing extrapolation formula Eq. 
OSAW. The solid line is of 80-step walk. 



Bl 



(lines) with exact enumeration results (symbols) about EED density of 



We note that there is a dip in the value of enumeration at the origin arising from the restriction of no returns to 
the origin. It might be possible to provide more refined approximation |32|, b ut in view of the general errors involved 
in the extrapolation process we have not considered this worth pursuing! 29[. In our studies, because chain molecule 
would like to extend under the force, the approximation is also reasonable. 
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